Overview


Dataset statistics

                                Dataset A   Dataset B
Number of variables             12          12
Number of observations          446         446
Missing cells                   442         437
Missing cells (%)               8.3%        8.2%
Duplicate rows                  0           0
Duplicate rows (%)              0.0%        0.0%
Total size in memory            45.3 KiB    45.3 KiB
Average record size in memory   104.0 B     104.0 B
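
The percentages and the record size in this table can be re-derived from the counts. The sketch below assumes that the denominator for missing cells is the total number of cells (variables times observations) and that 1 KiB is 1024 bytes; the function names are illustrative and are not part of the report.

```python
def missing_cells_pct(missing_cells: int, n_variables: int, n_observations: int) -> float:
    """Missing cells as a percentage of all cells (variables x observations)."""
    return 100.0 * missing_cells / (n_variables * n_observations)


def avg_record_size_bytes(total_kib: float, n_observations: int) -> float:
    """Average bytes per row, given the total in-memory size in KiB."""
    return total_kib * 1024 / n_observations


# Counts copied from the Dataset statistics table.
pct_a = missing_cells_pct(442, 12, 446)
pct_b = missing_cells_pct(437, 12, 446)
# 45.3 KiB is itself rounded, so this is only approximate.
record_size = avg_record_size_bytes(45.3, 446)

print(f"{pct_a:.1f}% {pct_b:.1f}% {record_size:.1f} B")  # prints "8.3% 8.2% 104.0 B"
```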

Variable types

              Dataset A   Dataset B
Numeric       5           5
Categorical   4           4
Text          3           3

Alerts

Dataset A                             Dataset B                             Type
Age has 93 (20.9%) missing values     Age has 85 (19.1%) missing values     Missing
Cabin has 348 (78.0%) missing values  Cabin has 351 (78.7%) missing values  Missing
PassengerId has unique values         PassengerId has unique values         Unique
Name has unique values                Name has unique values                Unique
SibSp has 310 (69.5%) zeros           SibSp has 301 (67.5%) zeros           Zeros
Parch has 340 (76.2%) zeros           Parch has 335 (75.1%) zeros           Zeros
Fare has 6 (1.3%) zeros               Fare has 8 (1.8%) zeros               Zeros

Reproduction

                         Dataset A                    Dataset B
Analysis started         2025-09-23 16:02:45.965654   2025-09-23 16:02:48.141995
Analysis finished        2025-09-23 16:02:48.139123   2025-09-23 16:02:50.267723
Duration                 2.17 seconds                 2.13 seconds
Software version         ydata-profiling v0.0.dev0    ydata-profiling v0.0.dev0
Download configuration   config.json                  config.json

Variables

PassengerId
Real number (ℝ)

               Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           443.85874   467.34978
Minimum        1           3
Maximum        891         890
Zeros          0           0
Zeros (%)      0.0%        0.0%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     1           3
5-th percentile             50.5        51.25
Q1                          211.25      250.25
median                      447.5       481.5
Q3                          668.5       683.5
95-th percentile            851.5       845.75
Maximum                     891         890
Range                       890         887
Interquartile range (IQR)   457.25      433.25

Descriptive statistics

                                  Dataset A      Dataset B
Standard deviation                258.60845      254.54825
Coefficient of variation (CV)     0.58263684     0.54466326
Kurtosis                          -1.2003823     -1.1902288
Mean                              443.85874      467.34978
Median Absolute Deviation (MAD)   228            216
Skewness                          0.034350528    -0.10317086
Sum                               197961         208438
Variance                          66878.333      64794.812
Monotonicity                      Not monotonic  Not monotonic
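
Several of these figures are linked by standard definitions. As a sanity check, the sketch below recomputes four of them for PassengerId in Dataset A from the other reported values. It assumes the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, range = maximum - minimum); the variable names are illustrative, and small differences come from rounding in the report.

```python
# Reported values for PassengerId, Dataset A.
mean, std = 443.85874, 258.60845
q1, q3 = 211.25, 668.5
minimum, maximum = 1, 891

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV={cv:.6f} variance={variance:.2f} IQR={iqr} range={value_range}")
```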
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Dataset A                                  Dataset B
Value               Count  Frequency (%)   Value               Count  Frequency (%)
124                 1      0.2%            499                 1      0.2%
477                 1      0.2%            792                 1      0.2%
569                 1      0.2%            192                 1      0.2%
642                 1      0.2%            157                 1      0.2%
523                 1      0.2%            582                 1      0.2%
546                 1      0.2%            344                 1      0.2%
259                 1      0.2%            30                  1      0.2%
111                 1      0.2%            887                 1      0.2%
530                 1      0.2%            748                 1      0.2%
99                  1      0.2%            99                  1      0.2%
Other values (436)  436    97.8%           Other values (436)  436    97.8%

Values in ascending order

Dataset A                       Dataset B
Value  Count  Frequency (%)     Value  Count  Frequency (%)
1      1      0.2%              3      1      0.2%
2      1      0.2%              5      1      0.2%
4      1      0.2%              8      1      0.2%
7      1      0.2%              9      1      0.2%
9      1      0.2%              10     1      0.2%
10     1      0.2%              14     1      0.2%
12     1      0.2%              16     1      0.2%
13     1      0.2%              22     1      0.2%
16     1      0.2%              23     1      0.2%
19     1      0.2%              24     1      0.2%

Survived
Categorical

               Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value   Dataset A   Dataset B
0       273         278
1       173         168

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      446         446
Distinct characters   2           2
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   0           0
2nd row   0           0
3rd row   1           1
4th row   0           1
5th row   0           0

Common Values

        Dataset A               Dataset B
Value   Count  Frequency (%)    Count  Frequency (%)
0       273    61.2%            278    62.3%
1       173    38.8%            168    37.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values for Dataset A and Dataset B]

Most occurring characters

        Dataset A               Dataset B
Value   Count  Frequency (%)    Count  Frequency (%)
0       273    61.2%            278    62.3%
1       173    38.8%            168    37.7%

Most occurring categories

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
(unknown)   446    100.0%           446    100.0%

Most frequent character per category

(unknown): identical to the Most occurring characters table, because every character falls in this single category.

Most occurring scripts

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
(unknown)   446    100.0%           446    100.0%

Most frequent character per script

(unknown): identical to the Most occurring characters table, because every character falls in this single script.

Most occurring blocks

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
(unknown)   446    100.0%           446    100.0%

Most frequent character per block

(unknown): identical to the Most occurring characters table, because every character falls in this single block.

Pclass
Categorical

               Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value   Dataset A   Dataset B
3       252         244
1       105         101
2       89          101

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      446         446
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   2           2
2nd row   3           2
3rd row   1           3
4th row   3           1
5th row   1           2

Common Values

        Dataset A               Dataset B
Value   Count  Frequency (%)    Count  Frequency (%)
3       252    56.5%            244    54.7%
1       105    23.5%            101    22.6%
2       89     20.0%            101    22.6%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values for Dataset A and Dataset B]

Most occurring characters

        Dataset A               Dataset B
Value   Count  Frequency (%)    Count  Frequency (%)
3       252    56.5%            244    54.7%
1       105    23.5%            101    22.6%
2       89     20.0%            101    22.6%

Most occurring categories

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
(unknown)   446    100.0%           446    100.0%

Most frequent character per category

(unknown): identical to the Most occurring characters table, because every character falls in this single category.

Most occurring scripts

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
(unknown)   446    100.0%           446    100.0%

Most frequent character per script

(unknown): identical to the Most occurring characters table, because every character falls in this single script.

Most occurring blocks

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
(unknown)   446    100.0%           446    100.0%

Most frequent character per block

(unknown): identical to the Most occurring characters table, because every character falls in this single block.

Name
Text

               Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      82          67
Median length   49          48
Mean length     26.798206   27.109865
Min length      13          12

Characters and Unicode

                      Dataset A   Dataset B
Total characters      11952       12091
Distinct characters   60          59
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       446         446
Unique (%)   100.0%      100.0%

Sample

1st row   Dataset A: Renouf, Mr. Peter Henry
          Dataset B: Gaskell, Mr. Alfred
2nd row   Dataset A: Doharr, Mr. Tannous
          Dataset B: Carbines, Mr. William
3rd row   Dataset A: Sagesser, Mlle. Emma
          Dataset B: Gilnagh, Miss. Katherine "Katie"
4th row   Dataset A: Lahoud, Mr. Sarkis
          Dataset B: Thayer, Mrs. John Borland (Marian Longstreth Morris)
5th row   Dataset A: Nicholson, Mr. Arthur Ernest
          Dataset B: Sedgwick, Mr. Charles Frederick Waddington

Dataset A                                  Dataset B
Value               Count  Frequency (%)   Value               Count  Frequency (%)
mr                  258    14.3%           mr                  261    14.4%
miss                91     5.0%            miss                88     4.8%
mrs                 68     3.8%            mrs                 64     3.5%
william             37     2.1%            william             33     1.8%
john                24     1.3%            master              25     1.4%
henry               20     1.1%            john                18     1.0%
master              18     1.0%            charles             15     0.8%
thomas              12     0.7%            thomas              15     0.8%
george              12     0.7%            henry               14     0.8%
mary                11     0.6%            george              11     0.6%
Other values (891)  1251   69.4%           Other values (913)  1272   70.0%

Most occurring characters

Dataset A                                 Dataset B
Value              Count  Frequency (%)   Value              Count  Frequency (%)
(space)            1357   11.4%           (space)            1371   11.3%
r                  957    8.0%            r                  964    8.0%
e                  848    7.1%            a                  839    6.9%
a                  826    6.9%            e                  837    6.9%
n                  659    5.5%            i                  699    5.8%
i                  659    5.5%            s                  646    5.3%
s                  626    5.2%            n                  635    5.3%
M                  569    4.8%            l                  557    4.6%
l                  530    4.4%            M                  556    4.6%
o                  491    4.1%            o                  487    4.0%
Other values (50)  4430   37.1%           Other values (49)  4500   37.2%

Most occurring categories

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
(unknown)   11952  100.0%           12091  100.0%

Most frequent character per category

(unknown): identical to the Most occurring characters table, because every character falls in this single category.

Most occurring scripts

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
(unknown)   11952  100.0%           12091  100.0%

Most frequent character per script

(unknown): identical to the Most occurring characters table, because every character falls in this single script.

Most occurring blocks

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
(unknown)   11952  100.0%           12091  100.0%

Most frequent character per block

(unknown): identical to the Most occurring characters table, because every character falls in this single block.

Sex
Categorical

               Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value    Dataset A   Dataset B
male     286         293
female   160         153

Length

                Dataset A   Dataset B
Max length      6           6
Median length   4           4
Mean length     4.7174888   4.6860987
Min length      4           4

Characters and Unicode

                      Dataset A   Dataset B
Total characters      2104        2090
Distinct characters   5           5
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   male        male
2nd row   male        male
3rd row   female      female
4th row   male        female
5th row   male        male

Common Values

         Dataset A               Dataset B
Value    Count  Frequency (%)    Count  Frequency (%)
male     286    64.1%            293    65.7%
female   160    35.9%            153    34.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values for Dataset A and Dataset B]

Most occurring characters

        Dataset A               Dataset B
Value   Count  Frequency (%)    Count  Frequency (%)
e       606    28.8%            599    28.7%
m       446    21.2%            446    21.3%
a       446    21.2%            446    21.3%
l       446    21.2%            446    21.3%
f       160    7.6%             153    7.3%
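
The character counts for Sex follow directly from the value counts: each occurrence of "male" or "female" contributes its letters once. The sketch below reproduces the Dataset A figures under that assumption; the names `value_counts` and `char_counts` are illustrative and do not come from the report.

```python
from collections import Counter

# Value counts for Sex in Dataset A, from the Common Values table.
value_counts = {"male": 286, "female": 160}

# Each character of a value is counted once per occurrence of that value.
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

total_chars = sum(char_counts.values())
mean_length = total_chars / sum(value_counts.values())

print(char_counts.most_common())
print(total_chars, round(mean_length, 7))
```

Here "e" is counted twice for every "female", which is why it reaches 606 while "m", "a" and "l" each match the 446 observations.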

Most occurring categories

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
(unknown)   2104   100.0%           2090   100.0%

Most frequent character per category

(unknown): identical to the Most occurring characters table, because every character falls in this single category.

Most occurring scripts

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
(unknown)   2104   100.0%           2090   100.0%

Most frequent character per script

(unknown): identical to the Most occurring characters table, because every character falls in this single script.

Most occurring blocks

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
(unknown)   2104   100.0%           2090   100.0%

Most frequent character per block

(unknown): identical to the Most occurring characters table, because every character falls in this single block.

Age
Real number (ℝ)

               Dataset A   Dataset B
Distinct       81          79
Distinct (%)   22.9%       21.9%
Missing        93          85
Missing (%)    20.9%       19.1%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           30.459178   28.614044
Minimum        0.42        0.42
Maximum        80          80
Zeros          0           0
Zeros (%)      0.0%        0.0%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0.42        0.42
5-th percentile             4           3
Q1                          21          19
median                      29          27
Q3                          39          36
95-th percentile            59.4        56
Maximum                     80          80
Range                       79.58       79.58
Interquartile range (IQR)   18          17

Descriptive statistics

                                  Dataset A      Dataset B
Standard deviation                15.111016      14.972679
Coefficient of variation (CV)     0.49610716     0.52326329
Kurtosis                          0.2718943      0.46267955
Mean                              30.459178      28.614044
Median Absolute Deviation (MAD)   9              8
Skewness                          0.47975377     0.4756362
Sum                               10752.09       10329.67
Variance                          228.34282      224.18112
Monotonicity                      Not monotonic  Not monotonic
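
Because Age has missing values, its mean is consistent with dividing the sum by the number of non-missing observations rather than by all 446 rows. The sketch below checks this for both datasets; it assumes missing values are simply skipped, and the function name is illustrative.

```python
def mean_ignoring_missing(total: float, n_observations: int, n_missing: int) -> float:
    """Mean over the non-missing observations only."""
    return total / (n_observations - n_missing)


# Sum, observation count and missing count for Age, from the tables above.
mean_a = mean_ignoring_missing(10752.09, 446, 93)  # 353 non-missing values
mean_b = mean_ignoring_missing(10329.67, 446, 85)  # 361 non-missing values

print(f"{mean_a:.4f} {mean_b:.4f}")
```

Dividing by 446 instead would give roughly 24.1 for Dataset A, which does not match the reported mean of 30.459178.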
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Dataset A                                 Dataset B
Value              Count  Frequency (%)   Value              Count  Frequency (%)
22                 15     3.4%            30                 16     3.6%
18                 14     3.1%            19                 15     3.4%
28                 14     3.1%            27                 14     3.1%
24                 14     3.1%            25                 13     2.9%
30                 13     2.9%            26                 13     2.9%
19                 13     2.9%            36                 13     2.9%
32                 12     2.7%            18                 12     2.7%
16                 11     2.5%            28                 11     2.5%
36                 11     2.5%            29                 11     2.5%
26                 11     2.5%            24                 11     2.5%
Other values (71)  225    50.4%           Other values (69)  232    52.0%
(Missing)          93     20.9%           (Missing)          85     19.1%

Values in ascending order

Dataset A                       Dataset B
Value  Count  Frequency (%)     Value  Count  Frequency (%)
0.42   1      0.2%              0.42   1      0.2%
0.67   1      0.2%              0.75   2      0.4%
0.75   2      0.4%              0.83   1      0.2%
1      3      0.7%              0.92   1      0.2%
2      5      1.1%              1      4      0.9%
3      3      0.7%              2      8      1.8%
4      4      0.9%              3      4      0.9%
5      1      0.2%              4      6      1.3%
6      1      0.2%              5      3      0.7%
7      1      0.2%              7      3      0.7%

SibSp
Real number (ℝ)

               Dataset A   Dataset B
Distinct       7           7
Distinct (%)   1.6%        1.6%
Missing        0           0
Missing (%)    0.0%        0.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           0.4955157   0.54932735
Minimum        0           0
Maximum        8           8
Zeros          310         301
Zeros (%)      69.5%       67.5%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          1           1
95-th percentile            2           3
Maximum                     8           8
Range                       8           8
Interquartile range (IQR)   1           1

Descriptive statistics

                                  Dataset A      Dataset B
Standard deviation                1.0594582      1.1202074
Coefficient of variation (CV)     2.1380922      2.0392348
Kurtosis                          19.023477      14.967694
Mean                              0.4955157      0.54932735
Median Absolute Deviation (MAD)   0              0
Skewness                          3.7702281      3.3831567
Sum                               221            245
Variance                          1.1224518      1.2548647
Monotonicity                      Not monotonic  Not monotonic
[Figure: Histogram with fixed size bins (bins=7)]

Common values

Dataset A                       Dataset B
Value  Count  Frequency (%)     Value  Count  Frequency (%)
0      310    69.5%             0      301    67.5%
1      99     22.2%             1      104    23.3%
2      17     3.8%              2      14     3.1%
4      7      1.6%              4      11     2.5%
3      7      1.6%              3      10     2.2%
5      3      0.7%              5      3      0.7%
8      3      0.7%              8      3      0.7%

Values in ascending order

Dataset A                       Dataset B
Value  Count  Frequency (%)     Value  Count  Frequency (%)
0      310    69.5%             0      301    67.5%
1      99     22.2%             1      104    23.3%
2      17     3.8%              2      14     3.1%
3      7      1.6%              3      10     2.2%
4      7      1.6%              4      11     2.5%
5      3      0.7%              5      3      0.7%
8      3      0.7%              8      3      0.7%

Parch
Real number (ℝ)

               Dataset A    Dataset B
Distinct       7            7
Distinct (%)   1.6%         1.6%
Missing        0            0
Missing (%)    0.0%         0.0%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           0.39237668   0.39237668
Minimum        0            0
Maximum        6            6
Zeros          340          335
Zeros (%)      76.2%        75.1%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          0           0
95-th percentile            2           2
Maximum                     6           6
Range                       6           6
Interquartile range (IQR)   0           0

Descriptive statistics

                                  Dataset A      Dataset B
Standard deviation                0.85108534     0.80494829
Coefficient of variation (CV)     2.1690518      2.0514682
Kurtosis                          10.949782      10.338209
Mean                              0.39237668     0.39237668
Median Absolute Deviation (MAD)   0              0
Skewness                          2.9315905      2.7294591
Sum                               175            175
Variance                          0.72434625     0.64794175
Monotonicity                      Not monotonic  Not monotonic
[Figure: Histogram with fixed size bins (bins=7)]

Common values

Dataset A                       Dataset B
Value  Count  Frequency (%)     Value  Count  Frequency (%)
0      340    76.2%             0      335    75.1%
1      58     13.0%             1      61     13.7%
2      39     8.7%              2      44     9.9%
5      3      0.7%              3      2      0.4%
4      3      0.7%              5      2      0.4%
3      2      0.4%              4      1      0.2%
6      1      0.2%              6      1      0.2%

Values in ascending order

Dataset A                       Dataset B
Value  Count  Frequency (%)     Value  Count  Frequency (%)
0      340    76.2%             0      335    75.1%
1      58     13.0%             1      61     13.7%
2      39     8.7%              2      44     9.9%
3      2      0.4%              3      2      0.4%
4      3      0.7%              4      1      0.2%
5      3      0.7%              5      2      0.4%
6      1      0.2%              6      1      0.2%

Ticket
Text

               Dataset A   Dataset B
Distinct       383         380
Distinct (%)   85.9%       85.2%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      18          18
Median length   17          17
Mean length     6.7959641   6.8318386
Min length      3           4

Characters and Unicode

                      Dataset A   Dataset B
Total characters      3031        3047
Distinct characters   31          35
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       334         333
Unique (%)   74.9%       74.7%

Sample

          Dataset A   Dataset B
1st row   31027       239865
2nd row   2686        28424
3rd row   PC 17477    35851
4th row   2624        17421
5th row   693         244361

Dataset A                                  Dataset B
Value               Count  Frequency (%)   Value               Count  Frequency (%)
pc                  28     5.0%            pc                  27     4.7%
c.a                 16     2.8%            c.a                 14     2.5%
a/5                 9      1.6%            a/5                 8      1.4%
ca                  7      1.2%            ca                  8      1.4%
w./c                7      1.2%            2                   6      1.1%
soton/o.q           7      1.2%            ston/o              6      1.1%
ston/o              6      1.1%            w./c                5      0.9%
2                   6      1.1%            sc/paris            5      0.9%
1601                4      0.7%            1601                5      0.9%
2144                4      0.7%            347082              5      0.9%
Other values (401)  471    83.4%           Other values (402)  481    84.4%

Most occurring characters

Dataset A                                 Dataset B
Value              Count  Frequency (%)   Value              Count  Frequency (%)
3                  372    12.3%           3                  377    12.4%
1                  338    11.2%           1                  350    11.5%
2                  296    9.8%            2                  306    10.0%
7                  247    8.1%            7                  229    7.5%
4                  240    7.9%            4                  225    7.4%
6                  221    7.3%            6                  213    7.0%
0                  201    6.6%            5                  205    6.7%
5                  195    6.4%            0                  194    6.4%
9                  155    5.1%            9                  163    5.3%
8                  146    4.8%            8                  145    4.8%
Other values (21)  620    20.5%           Other values (25)  640    21.0%

Most occurring categories

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
(unknown)   3031   100.0%           3047   100.0%

Most frequent character per category

(unknown): identical to the Most occurring characters table, because every character falls in this single category.

Most occurring scripts

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
(unknown)   3031   100.0%           3047   100.0%

Most frequent character per script

(unknown): identical to the Most occurring characters table, because every character falls in this single script.

Most occurring blocks

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
(unknown)   3031   100.0%           3047   100.0%

Most frequent character per block

(unknown): identical to the Most occurring characters table, because every character falls in this single block.

Fare
Real number (ℝ)

               Dataset A   Dataset B
Distinct       179         183
Distinct (%)   40.1%       41.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           31.32685    30.19587
Minimum        0           0
Maximum        512.3292    512.3292
Zeros          6           8
Zeros (%)      1.3%        1.8%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             7.0719      7.15
Q1                          7.8958      7.925
median                      13.5        15.2458
Q3                          30.92395    30.92395
95-th percentile            93.5        86.5
Maximum                     512.3292    512.3292
Range                       512.3292    512.3292
Interquartile range (IQR)   23.02815    22.99895

Descriptive statistics

                                  Dataset A      Dataset B
Standard deviation                50.063661      44.575585
Coefficient of variation (CV)     1.5981071      1.4762146
Kurtosis                          40.93267       37.472856
Mean                              31.32685       30.19587
Median Absolute Deviation (MAD)   6.2708         7.96455
Skewness                          5.3663318      4.998816
Sum                               13971.775      13467.358
Variance                          2506.3701      1986.9827
Monotonicity                      Not monotonic  Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Dataset A                                  Dataset B
Value               Count  Frequency (%)   Value               Count  Frequency (%)
13                  24     5.4%            8.05                21     4.7%
7.75                24     5.4%            7.8958              19     4.3%
8.05                20     4.5%            13                  17     3.8%
7.8958              17     3.8%            26                  15     3.4%
10.5                13     2.9%            7.75                15     3.4%
7.2292              9      2.0%            10.5                11     2.5%
26.55               9      2.0%            7.225               10     2.2%
26                  8      1.8%            7.775               10     2.2%
7.925               8      1.8%            7.925               8      1.8%
7.05                7      1.6%            26.55               8      1.8%
Other values (169)  307    68.8%           Other values (173)  312    70.0%

Values in ascending order

Dataset A                        Dataset B
Value   Count  Frequency (%)     Value   Count  Frequency (%)
0       6      1.3%              0       8      1.8%
6.2375  1      0.2%              4.0125  1      0.2%
6.4375  1      0.2%              6.2375  1      0.2%
6.45    1      0.2%              6.45    1      0.2%
6.75    2      0.4%              6.4958  1      0.2%
6.95    1      0.2%              6.75    2      0.4%
6.975   1      0.2%              6.975   1      0.2%
7.0458  1      0.2%              7.0458  1      0.2%
7.05    7      1.6%              7.05    3      0.7%
7.0542  2      0.4%              7.0542  1      0.2%

Cabin
Text

               Dataset A   Dataset B
Distinct       88          82
Distinct (%)   89.8%       86.3%
Missing        348         351
Missing (%)    78.0%       78.7%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      15          15
Median length   3           3
Mean length     3.4591837   3.3473684
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      339         318
Distinct characters   19          19
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       78          72
Unique (%)   79.6%       75.8%

Sample

          Dataset A   Dataset B
1st row   B35         C68
2nd row   C110        B18
3rd row   E24         E24
4th row   E31         E67
5th row   B69         F4

Dataset A                                 Dataset B
Value              Count  Frequency (%)   Value              Count  Frequency (%)
c92                2      1.8%            d                  3      2.8%
f4                 2      1.8%            c22                3      2.8%
e101               2      1.8%            c26                3      2.8%
e44                2      1.8%            f2                 3      2.8%
c126               2      1.8%            b77                2      1.9%
f33                2      1.8%            b5                 2      1.9%
c124               2      1.8%            c2                 2      1.9%
c123               2      1.8%            e8                 2      1.9%
c78                2      1.8%            b22                2      1.9%
e33                2      1.8%            f4                 2      1.9%
Other values (89)  90     81.8%           Other values (83)  84     77.8%

Most occurring characters

Dataset A                                Dataset B
Value             Count  Frequency (%)   Value             Count  Frequency (%)
C                 35     10.3%           2                 40     12.6%
3                 32     9.4%            B                 30     9.4%
2                 31     9.1%            C                 29     9.1%
1                 30     8.8%            6                 24     7.5%
4                 24     7.1%            8                 21     6.6%
B                 24     7.1%            5                 21     6.6%
6                 23     6.8%            1                 20     6.3%
5                 18     5.3%            D                 17     5.3%
7                 18     5.3%            7                 17     5.3%
E                 17     5.0%            4                 17     5.3%
Other values (9)  87     25.7%           Other values (9)  82     25.8%

Most occurring categories

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
(unknown)   339    100.0%           318    100.0%

Most frequent character per category

(unknown): identical to the Most occurring characters table, because every character falls in this single category.

Most occurring scripts

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
(unknown)   339    100.0%           318    100.0%

Most frequent character per script

(unknown): identical to the Most occurring characters table, because every character falls in this single script.

Most occurring blocks

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
(unknown)   339    100.0%           318    100.0%

Most frequent character per block

(unknown): identical to the Most occurring characters table, because every character falls in this single block.

Embarked
Categorical

               Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        1           1
Missing (%)    0.2%        0.2%
Memory size    7.0 KiB     7.0 KiB

Value   Dataset A   Dataset B
S       324         324
C       76          87
Q       45          34

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      445         445
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   S           S
2nd row   C           S
3rd row   C           Q
4th row   C           C
5th row   S           S

Common Values

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
S           324    72.6%            324    72.6%
C           76     17.0%            87     19.5%
Q           45     10.1%            34     7.6%
(Missing)   1      0.2%             1      0.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values for Dataset A and Dataset B]

        Dataset A               Dataset B
Value   Count  Frequency (%)    Count  Frequency (%)
s       324    72.8%            324    72.8%
c       76     17.1%            87     19.6%
q       45     10.1%            34     7.6%

Most occurring characters

        Dataset A               Dataset B
Value   Count  Frequency (%)    Count  Frequency (%)
S       324    72.8%            324    72.8%
C       76     17.1%            87     19.6%
Q       45     10.1%            34     7.6%

Most occurring categories

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
(unknown)   445    100.0%           445    100.0%

Most frequent character per category

(unknown): identical to the Most occurring characters table, because every character falls in this single category.

Most occurring scripts

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
(unknown)   445    100.0%           445    100.0%

Most frequent character per script

(unknown): identical to the Most occurring characters table, because every character falls in this single script.

Most occurring blocks

            Dataset A               Dataset B
Value       Count  Frequency (%)    Count  Frequency (%)
(unknown)   445    100.0%           445    100.0%

Most frequent character per block

(unknown): identical to the Most occurring characters table, because every character falls in this single block.

Interactions

[Figures: 25 pairs of interaction plots, one for Dataset A and one for Dataset B in each pair]

Correlations

Dataset A

[Figure: correlation heatmap, Dataset A]

Dataset B

[Figure: correlation heatmap, Dataset B]

Dataset A

                Age  Embarked    Fare   Parch  PassengerId  Pclass     Sex   SibSp  Survived
Age           1.000     0.000   0.174  -0.242        0.032   0.242   0.149  -0.168     0.096
Embarked      0.000     1.000   0.196   0.000        0.000   0.257   0.105   0.000     0.220
Fare          0.174     0.196   1.000   0.386       -0.014   0.474   0.209   0.424     0.252
Parch        -0.242     0.000   0.386   1.000        0.046   0.000   0.323   0.477     0.122
PassengerId   0.032     0.000  -0.014   0.046        1.000   0.011   0.083  -0.069     0.152
Pclass        0.242     0.257   0.474   0.000        0.011   1.000   0.096   0.121     0.300
Sex           0.149     0.105   0.209   0.323        0.083   0.096   1.000   0.236     0.482
SibSp        -0.168     0.000   0.424   0.477       -0.069   0.121   0.236   1.000     0.178
Survived      0.096     0.220   0.252   0.122        0.152   0.300   0.482   0.178     1.000

Dataset B

                Age  Embarked    Fare   Parch  PassengerId  Pclass     Sex   SibSp  Survived
Age           1.000     0.114   0.093  -0.323       -0.003   0.253   0.089  -0.255     0.227
Embarked      0.114     1.000   0.091   0.016        0.000   0.221   0.000   0.078     0.103
Fare          0.093     0.091   1.000   0.426       -0.015   0.426   0.143   0.458     0.214
Parch        -0.323     0.016   0.426   1.000        0.016   0.037   0.329   0.500     0.213
PassengerId  -0.003     0.000  -0.015   0.016        1.000   0.000   0.000  -0.037     0.144
Pclass        0.253     0.221   0.426   0.037        0.000   1.000   0.109   0.146     0.303
Sex           0.089     0.000   0.143   0.329        0.000   0.109   1.000   0.198     0.474
SibSp        -0.255     0.078   0.458   0.500       -0.037   0.146   0.198   1.000     0.151
Survived      0.227     0.103   0.214   0.213        0.144   0.303   0.474   0.151     1.000
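
One way to read the two matrices side by side is to check how far each variable's correlation with Survived shifts between the datasets. The sketch below copies the Survived row of each table into a plain dictionary and ranks the absolute gaps. The helper name `correlation_gaps` and the dictionary names are illustrative and are not part of the report.

```python
# Correlation of each variable with Survived, copied from the
# Survived row of the Dataset A and Dataset B tables.
survived_corr_a = {
    "Age": 0.096, "Embarked": 0.220, "Fare": 0.252, "Parch": 0.122,
    "PassengerId": 0.152, "Pclass": 0.300, "Sex": 0.482, "SibSp": 0.178,
}
survived_corr_b = {
    "Age": 0.227, "Embarked": 0.103, "Fare": 0.214, "Parch": 0.213,
    "PassengerId": 0.144, "Pclass": 0.303, "Sex": 0.474, "SibSp": 0.151,
}


def correlation_gaps(a, b):
    """Return (variable, |a - b|) pairs, largest gap first."""
    gaps = {name: round(abs(a[name] - b[name]), 3) for name in a}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


gaps = correlation_gaps(survived_corr_a, survived_corr_b)
for name, gap in gaps:
    print(f"{name:<12}{gap:.3f}")
```

On these values, Age shows the largest shift and Pclass the smallest. With only 446 observations per dataset, small gaps of this kind may well be sampling noise.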

Missing values

Dataset A

[Figure: nullity by column, Dataset A]
A simple visualization of nullity by column.

Dataset B

[Figure: nullity by column, Dataset B]
A simple visualization of nullity by column.

Dataset A

[Figure: nullity matrix, Dataset A]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Dataset B

[Figure: nullity matrix, Dataset B]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Dataset A

[Figure: nullity correlation heatmap, Dataset A]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Dataset B

[Figure: nullity correlation heatmap, Dataset B]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
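
As a rough illustration of nullity correlation, the sketch below turns two columns into 0/1 "is missing" indicators and computes their Pearson correlation using only the standard library. The rows are made-up toy data, not values from either dataset, and the function names `pearson` and `nullity_correlation` are hypothetical. The report may compute its heatmap differently.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A column that is never (or always) missing has no defined correlation.
        return float("nan")
    return cov / sqrt(var_x * var_y)


def nullity_correlation(rows, col_a, col_b):
    """Correlate the missingness (None) indicators of two columns."""
    is_null_a = [1 if row[col_a] is None else 0 for row in rows]
    is_null_b = [1 if row[col_b] is None else 0 for row in rows]
    return pearson(is_null_a, is_null_b)


# Toy rows (hypothetical): None marks a missing value.
toy_rows = [
    {"Age": 30.0, "Cabin": None},
    {"Age": None, "Cabin": None},
    {"Age": 22.0, "Cabin": "A1"},
    {"Age": None, "Cabin": None},
    {"Age": 58.0, "Cabin": "B2"},
    {"Age": 41.0, "Cabin": None},
]

r = nullity_correlation(toy_rows, "Age", "Cabin")
print(f"nullity correlation Age/Cabin: {r:.2f}")
```

A value near 1 means the two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means the missingness patterns look unrelated.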

Sample

Dataset A

Index  PassengerId  Survived  Pclass  Name                                      Sex     Age   SibSp  Parch  Ticket      Fare      Cabin  Embarked
476    477          0         2       Renouf, Mr. Peter Henry                   male    34.0  1      0      31027       21.0000   NaN    S
568    569          0         3       Doharr, Mr. Tannous                       male    NaN   0      0      2686        7.2292    NaN    C
641    642          1         1       Sagesser, Mlle. Emma                      female  24.0  0      0      PC 17477    69.3000   B35    C
522    523          0         3       Lahoud, Mr. Sarkis                        male    NaN   0      0      2624        7.2250    NaN    C
545    546          0         1       Nicholson, Mr. Arthur Ernest              male    64.0  0      0      693         26.0000   NaN    S
258    259          1         1       Ward, Miss. Anna                          female  35.0  0      0      PC 17755    512.3292  NaN    C
110    111          0         1       Porter, Mr. Walter Chamberlain            male    47.0  0      0      110465      52.0000   C110   S
529    530          0         2       Hocking, Mr. Richard George               male    23.0  2      1      29104       11.5000   NaN    S
98     99           1         2       Doling, Mrs. John T (Ada Julia Bone)      female  34.0  0      1      231919      23.0000   NaN    S
888    889          0         3       Johnston, Miss. Catherine Helen "Carrie"  female  NaN   1      2      W./C. 6607  23.4500   NaN    S

Dataset B

Index  PassengerId  Survived  Pclass  Name                                                  Sex     Age   SibSp  Parch  Ticket  Fare      Cabin  Embarked
791    792          0         2       Gaskell, Mr. Alfred                                   male    16.0  0      0      239865  26.0000   NaN    S
191    192          0         2       Carbines, Mr. William                                 male    19.0  0      0      28424   13.0000   NaN    S
156    157          1         3       Gilnagh, Miss. Katherine "Katie"                      female  16.0  0      0      35851   7.7333    NaN    Q
581    582          1         1       Thayer, Mrs. John Borland (Marian Longstreth Morris)  female  39.0  1      1      17421   110.8833  C68    C
343    344          0         2       Sedgwick, Mr. Charles Frederick Waddington            male    25.0  0      0      244361  13.0000   NaN    S
29     30           0         3       Todoroff, Mr. Lalio                                   male    NaN   0      0      349216  7.8958    NaN    S
886    887          0         2       Montvila, Rev. Juozas                                 male    27.0  0      0      211536  13.0000   NaN    S
747    748          1         2       Sinkkonen, Miss. Anna                                 female  30.0  0      0      250648  13.0000   NaN    S
98     99           1         2       Doling, Mrs. John T (Ada Julia Bone)                  female  34.0  0      1      231919  23.0000   NaN    S
542    543          0         3       Andersson, Miss. Sigrid Elisabeth                     female  11.0  4      2      347082  31.2750   NaN    S

Dataset A

Index  PassengerId  Survived  Pclass  Name                            Sex     Age   SibSp  Parch  Ticket             Fare      Cabin  Embarked
619    620          0         2       Gavey, Mr. Lawrence             male    26.0  0      0      31028              10.5000   NaN    S
266    267          0         3       Panula, Mr. Ernesti Arvid       male    16.0  4      1      3101295            39.6875   NaN    S
615    616          1         2       Herman, Miss. Alice             female  24.0  1      2      220845             65.0000   NaN    S
564    565          0         3       Meanwell, Miss. (Marion Ogden)  female  NaN   0      0      SOTON/O.Q. 392087  8.0500    NaN    S
288    289          1         2       Hosono, Mr. Masabumi            male    42.0  0      0      237798             13.0000   NaN    S
706    707          1         2       Kelly, Mrs. Florence "Fannie"   female  45.0  0      0      223596             13.5000   NaN    S
125    126          1         3       Nicola-Yarred, Master. Elias    male    12.0  1      0      2651               11.2417   NaN    C
280    281          0         3       Duane, Mr. Frank                male    65.0  0      0      336439             7.7500    NaN    Q
517    518          0         3       Ryan, Mr. Patrick               male    NaN   0      0      371110             24.1500   NaN    Q
123    124          1         2       Webber, Miss. Susan             female  32.5  0      0      27267              13.0000   E101   S

Dataset B

Index  PassengerId  Survived  Pclass  Name                                                                 Sex     Age   SibSp  Parch  Ticket   Fare      Cabin    Embarked
854    855          0         2       Carter, Mrs. Ernest Courtenay (Lilian Hughes)                        female  44.0  1      0      244252   26.0000   NaN      S
506    507          1         2       Quick, Mrs. Frederick Charles (Jane Richards)                        female  33.0  0      2      26360    26.0000   NaN      S
480    481          0         3       Goodwin, Master. Harold Victor                                       male    9.0   5      2      CA 2144  46.9000   NaN      S
328    329          1         3       Goldsmith, Mrs. Frank John (Emily Alice Brown)                       female  31.0  1      1      363291   20.5250   NaN      S
646    647          0         3       Cor, Mr. Liudevit                                                    male    19.0  0      0      349231   7.8958    NaN      S
362    363          0         3       Barbara, Mrs. (Catherine David)                                      female  45.0  0      1      2691     14.4542   NaN      C
427    428          1         2       Phillips, Miss. Kate Florence ("Mrs Kate Louise Phillips Marshall")  female  19.0  0      0      250655   26.0000   NaN      S
730    731          1         1       Allen, Miss. Elisabeth Walton                                        female  29.0  0      0      24160    211.3375  B5       S
352    353          0         3       Elias, Mr. Tannous                                                   male    15.0  1      1      2695     7.2292    NaN      C
498    499          0         1       Allison, Mrs. Hudson J C (Bessie Waldo Daniels)                      female  25.0  1      2      113781   151.5500  C22 C26  S

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.
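
For readers who want to reproduce a duplicate-row check by hand, here is a minimal sketch using only the standard library. It counts identical rows and reports those that occur more than once, similar in spirit to a "# duplicates" column. The function name `duplicate_rows` and the sample rows are hypothetical, and this is not necessarily how the report itself detects duplicates.

```python
from collections import Counter


def duplicate_rows(rows):
    """Map each row that appears more than once to its occurrence count."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Hypothetical toy rows: the first and third are identical.
toy_rows = [
    (1, "a", 10.0),
    (2, "b", 20.0),
    (1, "a", 10.0),
]

dupes = duplicate_rows(toy_rows)
if dupes:
    for row, n in dupes.items():
        print(f"{row} appears {n} times")
else:
    print("Dataset does not contain duplicate rows.")
```

A column with unique values in every row, as PassengerId is reported to be here, rules out fully identical rows, so an empty result is expected for both datasets.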